15 research outputs found

Practical algorithms for biological sequence analysis: methods and applications

Author: Retha Ahmad
Publication date: 01/06/2019

King's Research Portal

Erratum to: Circular sequence comparison: Algorithms and applications

Author: Grossi Roberto
Iliopoulos Costas S.
Mercas Robert
Pisanti Nadia
Pissis Solon P.
Retha Ahmad
Vayani Fatima
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

[This corrects the article DOI: 10.1186/s13015-016-0076-6.]

Crossref

Springer - Publisher Connector

Archivio della Ricerca - Università di Pisa

PubMed Central

King's Research Portal

Circular sequence comparison: algorithms and applications

Author: Ahmad Retha (7168871)
Costas S. Iliopoulos (7168862)
Fatima Vayani (7168874)
Nadia Pisanti (7168865)
Robert Mercas (2835212)
Roberto Grossi (7168859)
Solon P. Pissis (7168868)
Publication date: 01/01/2016

Background: Sequence comparison is a fundamental step in many important tasks in bioinformatics, from phylogenetic reconstruction to the reconstruction of genomes. Traditional algorithms for measuring approximation in sequence comparison are based on the notions of distance or similarity, and are generally computed through sequence alignment techniques. As circular molecular structure is a common phenomenon in nature, a caveat of the adaptation of alignment techniques for circular sequence comparison is that they are computationally expensive, requiring from super-quadratic to cubic time in the length of the sequences. Results: In this paper, we introduce a new distance measure based on q-grams, and show how it can be applied effectively and computed efficiently for circular sequence comparison. Experimental results, using real DNA, RNA, and protein sequences as well as synthetic data, demonstrate orders-of-magnitude superiority of our approach in terms of efficiency, while maintaining an accuracy very competitive to the state of the art.
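To make the idea in the abstract above concrete, here is a minimal brute-force sketch of comparing two circular sequences with q-grams. It is an illustration only, not the algorithm from the paper: the function names, the `blocks` parameter, and the choice to sum per-block q-gram distances over every rotation are assumptions made for this example, and the quadratic loop over rotations is exactly the cost an efficient method would avoid.

```python
from collections import Counter


def qgram_profile(s, q):
    """Count every length-q substring (q-gram) of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(x, y, q):
    """Sum of absolute differences between the q-gram counts of x and y."""
    px, py = qgram_profile(x, q), qgram_profile(y, q)
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())


def blockwise_qgram_distance(x, y, q, blocks):
    """Split x and y into aligned blocks and sum the per-block q-gram distances.

    Hypothetical helper for this sketch; the last block absorbs any remainder.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes sequences of equal length")
    n = len(x)
    size = n // blocks
    total = 0
    for b in range(blocks):
        start = b * size
        end = n if b == blocks - 1 else start + size
        total += qgram_distance(x[start:end], y[start:end], q)
    return total


def best_rotation(x, y, q, blocks=2):
    """Try every rotation of x and return (rotation, distance) with the smallest distance."""
    best_r, best_d = 0, None
    for r in range(len(x)):
        rotated = x[r:] + x[:r]
        d = blockwise_qgram_distance(rotated, y, q, blocks)
        if best_d is None or d < best_d:
            best_r, best_d = r, d
    return best_r, best_d


# "ACATGATT" is "GATTACAT" rotated by 4 positions, so the distance drops to 0 there.
print(best_rotation("GATTACAT", "ACATGATT", q=2, blocks=2))  # (4, 0)
```

Splitting into blocks matters in this sketch because the q-gram profile of a whole string changes very little under rotation; comparing aligned blocks makes the rotation visible in the score.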

Loughborough University Institutional Repository

libFLASM: a software library for fixed-length approximate string matching

Author: Ayad Lorraine A.K.
Pissis Solon P.
Retha Ahmad
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/11/2016

BACKGROUND: Approximate string matching is the problem of finding all factors of a given text that are at a distance at most k from a given pattern. Fixed-length approximate string matching is the problem of finding all factors of a text of length n that are at a distance at most k from any factor of length ℓ of a pattern of length m. There exist bit-vector techniques to solve the fixed-length approximate string matching problem in time [Formula: see text] and space [Formula: see text] under the edit and Hamming distance models, where w is the size of the computer word; as such these techniques are independent of the distance threshold k or the alphabet size. Fixed-length approximate string matching is a generalisation of approximate string matching and, hence, has numerous direct applications in computational molecular biology and elsewhere. RESULTS: We present and make available libFLASM, a free open-source C++ software library for solving fixed-length approximate string matching under both the edit and the Hamming distance models. Moreover, we describe how fixed-length approximate string matching is applied to solve real problems by incorporating libFLASM into established applications for multiple circular sequence alignment as well as single and structured motif extraction. Specifically, we describe how it can be used to improve the accuracy of multiple circular sequence alignment in terms of the inferred likelihood-based phylogenies; and we also describe how it is used to efficiently find motifs in molecular sequences representing regulatory or functional regions. The comparison of the performance of the library to other algorithms shows that it is competitive, especially with increasing distance thresholds. CONCLUSIONS: Fixed-length approximate string matching is a generalisation of the classic approximate string matching problem. We present libFLASM, a free open-source C++ software library for solving fixed-length approximate string matching. The extensive experimental results presented here suggest that other applications could benefit from using libFLASM, and thus further maintenance and development of libFLASM is desirable.
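The problem statement in the abstract above can be illustrated with a naive sketch for the Hamming distance model. This is not libFLASM's API and not the bit-vector technique the abstract mentions; the function name `fixed_length_matches_naive` and the returned tuple layout are assumptions for this example. It simply compares every length-ℓ factor of the pattern with every length-ℓ factor of the text.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))


def fixed_length_matches_naive(text, pattern, ell, k):
    """Brute-force fixed-length approximate matching under Hamming distance.

    Returns (i, j, d) triples: the pattern factor starting at i and the text
    factor starting at j, both of length ell, are at Hamming distance d <= k.
    Runs in O(n * m * ell) time, unlike the bit-vector methods.
    """
    matches = []
    for i in range(len(pattern) - ell + 1):
        factor = pattern[i:i + ell]
        for j in range(len(text) - ell + 1):
            d = hamming(factor, text[j:j + ell])
            if d <= k:
                matches.append((i, j, d))
    return matches


# With k = 0 only exact occurrences of the pattern factors "CG" and "GA" are reported.
print(fixed_length_matches_naive("ACGTACGT", "CGA", ell=2, k=0))
# [(0, 1, 0), (0, 5, 0)]
```

Setting ℓ equal to the pattern length m reduces this to classic approximate string matching under Hamming distance, which is the sense in which the fixed-length problem generalises it.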

Springer - Publisher Connector

PubMed Central

King's Research Portal

Generalised Implementation for Fixed-Length Approximate String Matching under Hamming Distance and Applications

Author: Pissis Solon P.
Retha Ahmad
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

Crossref

King's Research Portal

Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VCF Files On-line

Author: Pissis Solon P.
Retha Ahmad
Publication date: 01/01/2018

An elastic-degenerate string is a sequence of n sets of strings of total length N. It has been introduced to represent multiple sequence alignments of closely-related sequences in a compact form. For a standard pattern of length m, pattern matching in an elastic-degenerate text can be solved on-line in time O(nm^2+N) with pre-processing time and space O(m) (Grossi et al., CPM 2017). A fast bit-vector algorithm requiring time O(N * ceil[m/w]) with pre-processing time and space O(m * ceil[m/w]), where w is the size of the computer word, was also presented. In this paper we consider the same problem for a set of patterns of total length M. A straightforward generalization of the existing bit-vector algorithm would require time O(N * ceil[M/w]) with pre-processing time and space O(M * ceil[M/w]), which is prohibitive in practice. We present a new on-line O(N * ceil[M/w])-time algorithm with pre-processing time and space O(M). We present experimental results using both synthetic and real data demonstrating the performance of the algorithm. We further demonstrate a real application of our algorithm in a pipeline for discovery and verification of minimal absent words (MAWs) in the human genome showing that a significant number of previously discovered MAWs are in fact false-positives when a population's variants are considered.
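As a rough illustration of the data structure described above, the sketch below models an elastic-degenerate text as a list of segments, each a list of alternative strings (possibly including the empty string), and scans it on-line for a single pattern. It is a naive set-based version written for this listing, not the bit-vector algorithm from the paper; the name `ed_find` and the convention of reporting the index of the segment where an occurrence ends are assumptions. A set of patterns could be handled by calling it once per pattern.

```python
def ed_find(ed_text, pattern):
    """Return indices of segments in which some occurrence of pattern ends.

    `active` holds the lengths of pattern prefixes that match a path of
    choices ending exactly at the previous segment boundary.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    active = set()
    ends = []
    for idx, segment in enumerate(ed_text):
        new_active = set()
        found = False
        for s in segment:
            # Occurrence lying entirely inside one alternative.
            if pattern in s:
                found = True
            # Extend partial matches carried over from earlier segments.
            for p in active:
                rest = pattern[p:]
                if s.startswith(rest):
                    found = True  # the pattern is completed inside s
                elif rest.startswith(s):
                    new_active.add(p + len(s))  # s is consumed whole
            # Start new partial matches at a suffix of s.
            for length in range(1, min(len(s), m - 1) + 1):
                if s.endswith(pattern[:length]):
                    new_active.add(length)
        if found:
            ends.append(idx)
        active = new_active
    return ends


# Three segments: "A", then one of "C", "G" or the empty string, then "T".
ed = [["A"], ["C", "G", ""], ["T"]]
print(ed_find(ed, "ACT"))  # [2]
print(ed_find(ed, "AT"))   # [2]  (uses the empty alternative)
print(ed_find(ed, "AA"))   # []
```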

Dagstuhl Research Online Publication Server

King's Research Portal

Circular sequence comparison: algorithms and applications

Author: Grossi Roberto
Iliopoulos Costas S.
Mercas Robert
Pisanti Nadia
Pissis Solon P.
Retha Ahmad
Vayani Fatima
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Erratum to this article has been published in Algorithms for Molecular Biology 2016. DOI: 10.1186/s13015-016-0084-6. WOS: 000381712000001. PMID: 27471546. Background: Sequence comparison is a fundamental step in many important tasks in bioinformatics; from phylogenetic reconstruction to the reconstruction of genomes. Traditional algorithms for measuring approximation in sequence comparison are based on the notions of distance or similarity, and are generally computed through sequence alignment techniques. As circular molecular structure is a common phenomenon in nature, a caveat of the adaptation of alignment techniques for circular sequence comparison is that they are computationally expensive, requiring from super-quadratic to cubic time in the length of the sequences. Results: In this paper, we introduce a new distance measure based on q-grams, and show how it can be applied effectively and computed efficiently for circular sequence comparison. Experimental results, using real DNA, RNA, and protein sequences as well as synthetic data, demonstrate orders-of-magnitude superiority of our approach in terms of efficiency, while maintaining an accuracy very competitive to the state of the art.

Loughborough University Institutional Repository

Springer - Publisher Connector

INRIA a CCSD electronic archive server

Archivio della Ricerca - Università di Pisa

PubMed Central

King's Research Portal

ProdInra

Hal-Diderot

On-Line Pattern Matching on Similar Texts

Author: Grossi Roberto
Iliopoulos Costas S.
Liu Chang
Pisanti Nadia
Pissis Solon P.
Retha Ahmad
Rosone Giovanna
Vayani Fatima
Versari Luca
Publication date: 01/01/2017

Pattern matching on a set of similar texts has received much attention, especially recently, mainly due to its application in cataloguing human genetic variation. In particular, many different algorithms have been proposed for the off-line version of this problem; that is, constructing a compressed index for a set of similar texts in order to answer pattern matching queries efficiently. However, the on-line, more fundamental, version of this problem is a rather undeveloped topic. Solutions to the on-line version can be beneficial for a number of reasons; for instance, efficient on-line solutions can be used in combination with partial indexes as practical trade-offs. We make here an attempt to close this gap via proposing two efficient algorithms for this problem. Notably, one of the algorithms requires time linear in the size of the texts' representation, for short patterns. Furthermore, experimental results confirm our theoretical findings in practical terms.

INRIA a CCSD electronic archive server

Archivio della Ricerca - Università di Pisa

Dagstuhl Research Online Publication Server

King's Research Portal

Accurate and efficient methods to improve multiple circular sequence alignment

Author: Barton Carl
Iliopoulos Costas S.
Kundu Ritu
Pissis Solon P.
Retha Ahmad
Vayani Fatima
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/06/2015

King's Research Portal

Circular Sequence Comparison with q-grams

Author: Grossi Roberto
Iliopoulos Costas S.
Mercaş Robert
Pisanti Nadia
Pissis Solon
Retha Ahmad
Vayani Fatima
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Sequence comparison is a fundamental step in many important tasks in bioinformatics. Traditional algorithms for measuring approximation in sequence comparison are based on the notions of distance or similarity, and are generally computed through sequence alignment techniques. As circular genome structure is a common phenomenon in nature, a caveat of specialized alignment techniques for circular sequence comparison is that they are computationally expensive, requiring from super-quadratic to cubic time in the length of the sequences. In this paper, we introduce a new distance measure based on q-grams, and show how it can be computed efficiently for circular sequence comparison. Experimental results, using real and synthetic data, demonstrate orders-of-magnitude superiority of our approach in terms of efficiency, while maintaining an accuracy very competitive to the state of the art.

INRIA a CCSD electronic archive server

Archivio della Ricerca - Università di Pisa

King's Research Portal